Abstract: Maxima of moving maxima of continuous functions (CM3) are
max-stable processes aimed at modeling extremes of continuous phenomena
over time. They are defined as Smith and Weissman's M4 processes with
continuous functions rather than vectors. After standardization of the
margins of the observed process into unit-Fréchet, CM3 processes can model
the remaining spatio-temporal dependence structure.

CM3 processes have the property of joint regular variation. The spectral
processes from this class admit particularly simple expressions.
Furthermore, depending on the speed with which the parameter functions tend
toward zero, CM3 processes fulfill the finite-cluster condition and the
strong mixing condition. Together, these three properties have
implications, for instance, for the expression of the extremal index.

A method for fitting a CM3 to data is investigated. The first step is to
estimate the length of the temporal dependence. Then, by selecting a
suitable number of blocks of extremes of this length, clustering algorithms
are used to estimate the total number of different profiles. The number of
parameter functions to retrieve is equal to the product of these two
numbers. They are estimated from the output of the partitioning algorithms
of the previous step. The full procedure requires only one parameter, which
is the range of variation allowed among the different profiles. The
dissimilarity between the original CM3 and the estimated version is
evaluated by means of the Hausdorff distance between the graphs of the
parameter functions.

AMS 2000 subject classifications: Primary 60G70; secondary 60G60. 
Keywords and phrases: CM3, M4, extremes, clusters, spectral process,
extremal index. 



1. Introduction 

Maxima of moving maxima of continuous functions (CM3) are the analogue of
Smith and Weissman's M4 processes [15] with continuous functions rather than
vectors. Let $a_i^{(j)}$ ($i \in \mathbb{Z}_+$, $j \in \mathbb{Z}$) be strictly
positive, real, continuous functions on a compact domain, say $[0,1]^d$. The
functions $a_i^{(j)}$ are the parameter functions. They are assumed to satisfy,
for every $x \in [0,1]^d$, the equality
$\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} a_i^{(j)}(x) = 1$. A CM3 process
$(X_t)_{t \in \mathbb{Z}}$ is defined by the expression

$$X_t(x) = \sup_{j \in \mathbb{Z}} \sup_{i \ge 0} a_i^{(j)}(x)\, Z_{t-i}^{(j)} \qquad (x \in [0,1]^d),$$

where the innovations $Z_i^{(j)}$ ($i \in \mathbb{Z}$, $j \in \mathbb{Z}$) are
independent and identically distributed unit-Fréchet random variables, i.e.
$P(Z_i^{(j)} \le z) = \exp(-1/z)$ for $z > 0$.

The fact that, given real numbers $\xi_i > 0$ ($i \in \mathbb{N}$) such that
$\xi_1 + \xi_2 + \cdots = 1$, the distribution of
$\max(\xi_1 Z_1, \xi_2 Z_2, \dots)$ stays unit-Fréchet implies that $X_t$ has
unit-Fréchet margins. However, the transformation from
$(Z_t)_{t \in \mathbb{Z}}$ to $(X_t)_{t \in \mathbb{Z}}$ induces a dependence
structure in time and space. Extremes appear in temporal clusters and, at
time $t$, a large value of $X_t$ at location $x$ causes large values at other
locations. Hence CM3 processes are able to model a wide range of
spatio-temporal dependences. The first part of this paper is a study of some
properties: spectral process, strong mixing condition, finite-cluster
condition and extremal index.

The second objective of this paper is to fit CM3 processes to samples with
measurement errors. For that purpose, CM3 will be discretized into M4 of
dimension D by selecting D points $x_d$ ($1 \le d \le D$) in the domain. It
will also be assumed that $0 \le i < K$ and $1 \le j \le L$ for finite
constants K and L. The practical model studied is thus

$$X_t(d) = \max_{1 \le j \le L} \max_{0 \le i < K} a_i^{(j)}(x_d)\, Z_{t-i}^{(j)} + \varepsilon_t(x_d) \qquad (1 \le d \le D)$$

where the $\varepsilon_t(x_d)$ are independent $N(0,\sigma^2)$ random
variables. The parameter K is the length of the temporal dependence and L is
the total number of reproducible patterns that we can observe, up to a
multiplicative constant, in the process. Figure 1 shows a realization of a
CM3 plotted versus an M4.
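
The practical model above can be simulated in a few lines. The following is only a minimal sketch, not the code used for the figures: the function name `simulate_m4`, the array layout `a[j, i, d]` for the values $a_i^{(j)}(x_d)$ and the use of NumPy are assumptions made for illustration. Unit-Fréchet innovations are obtained as reciprocals of standard exponential variables, since $P(1/E \le z) = \exp(-1/z)$.

```python
import numpy as np

def simulate_m4(a, T, sigma=0.0, rng=None):
    """Simulate T observations of the discretized model.

    a     : array of shape (L, K, D), a[j, i, d] = a_i^{(j+1)}(x_d) (assumed layout)
    sigma : standard deviation of the Gaussian measurement errors
    """
    rng = np.random.default_rng(rng)
    L, K, D = a.shape
    # Unit-Frechet innovations: reciprocal of a standard exponential variable.
    Z = 1.0 / rng.exponential(size=(L, T + K - 1))
    X = np.empty((T, D))
    for t in range(T):
        # Column t + K - 1 - i of Z plays the role of Z_{t-i}^{(j)}.
        lagged = Z[:, t + K - 1 - np.arange(K)]            # shape (L, K)
        X[t] = (a * lagged[:, :, None]).max(axis=(0, 1))   # max over j and i
    return X + sigma * rng.standard_normal((T, D))
```

With `sigma=0` and coefficients summing to one at each point, the margins of the simulated series are unit-Fréchet, as stated above.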




Fig 1. M4 with D = 5 on the left, CM3 on [0, 1] on the right (K = 3, L = 2). 



In Section 2, a coherent set of properties for CM3 is established. The
motivation is similar to that of [13, 14], but now for random continuous
functions. Theorem 2.3 is the joint regular variation of those processes, a
concept extended to Banach spaces in [9]. The spectral process of a CM3 has a
discrete distribution, given by the theorem. Next, depending on the speed
with which the parameter functions $a_i^{(j)}$ tend toward zero, Theorem 2.4
yields the finite-cluster condition and Theorem 2.5 yields the strong mixing
condition. These three properties together also have specific implications;
for instance, the inverse of the extremal index $\theta$ becomes the expected
size of clusters of extremes in the sense of [12].

CM3 processes are also examples of max-stable random fields [1, 2]: for every
finite space-time subset $A \times T \subset [0,1]^d \times \mathbb{Z}$, the
random vector $(X_t(x))_{x \in A,\, t \in T}$ has a multivariate extreme value
distribution. This property of M4 is inherited by CM3 since the law of a
continuous random field is characterized by its finite-dimensional
distributions, which are M4 according to Example 2.2.

Section 3 is a preparation for the estimation of the parameter functions.
Extremes will play a central role in identifying the recurring patterns and
their relative frequencies. So we need to study the probabilistic properties
of the blocks of extremes that can be observed in CM3. The harmonic mean
yields convenient expressions for the frequencies of the reproducible
patterns that can be observed.

In Section 4, we suggest and compare empirical methods to estimate K, L and
the parameter functions without assumptions on these objects. It is a
complement to [23], where only the case L = 1 is treated, and to [22], where
assumptions are made on the parameters.

This study is designed to improve the statistical analyses of extreme events,
as done in [17, 18] for instance.

2. Definition and properties 

Choose a nonempty compact domain of $\mathbb{R}^d$. To avoid multiplying
notations, this compact set will be taken to be $[0,1]^d$. Given an array
$Z_i^{(j)}$ ($i \in \mathbb{Z}$, $j \in \mathbb{Z}$) of independent
unit-Fréchet random variables, if $a_i^{(j)} : [0,1]^d \to \mathbb{R}_+^*$ are
deterministic strictly positive continuous functions, a CM3 process is a
stochastic process defined by

$$X_t(x) = \sup_{j \in \mathbb{Z}} \sup_{i \ge 0} a_i^{(j)}(x)\, Z_{t-i}^{(j)}. \qquad (2.1)$$

If furthermore

$$(\forall x \in [0,1]^d): \quad \sum_{j \in \mathbb{Z}} \sum_{i \ge 0} a_i^{(j)}(x) = 1$$

we say that $(X_t)_{t \in \mathbb{Z}}$ is a standard CM3 process.

The first result is a prerequisite for any use of CM3 processes. Recall that
the sup-norm of a function $f : [0,1]^d \to \mathbb{R}$ is
$\|f\|_\infty = \sup_{x \in [0,1]^d} |f(x)|$ and that, for continuous $f$,
this supremum is achieved.

Proposition 2.1. If

$$\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} \|a_i^{(j)}\|_\infty < \infty, \qquad (2.2)$$

then, for every $t \in \mathbb{Z}$, $X_t$ in (2.1) is a random element in
$\mathcal{C}([0,1]^d, \mathbb{R}_+)$ and the process
$(X_t)_{t \in \mathbb{Z}}$ is stationary.

The proofs of the results of this section are relegated to Appendix A.

Example 2.2. If $(X_t)_{t \in \mathbb{Z}}$ is a standard CM3 with
$0 \le i < K$ and $1 \le j \le L$, then for each
$x_1, \dots, x_D \in [0,1]^d$, the process
$(X_t(x_1), \dots, X_t(x_D))_{t \in \mathbb{Z}}$ is a standard M4. Under
(2.2), $(\|X_t\|_\infty)_{t \in \mathbb{Z}}$ is a non-standard M3. In both
cases $0 \le i < K$ and $1 \le j \le L$. This is helpful for the estimation
of K and L if $x_1, \dots, x_D$ are far enough apart, see 4.2.

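A CM3 process is an example of a jointly regularly varying time series. In
particular, there exists a process $(\Theta_t)_{t \in \mathbb{Z}}$ in
$\mathcal{C}([0,1]^d, \mathbb{R}_+)$, called the spectral process, which is
the limit in distribution, as $x \to \infty$,

$$\mathcal{L}\big( (X_t / \|X_0\|_\infty)_{t \in \mathbb{Z}} \,\big|\, \|X_0\|_\infty > x \big) \to \mathcal{L}\big( (\Theta_t)_{t \in \mathbb{Z}} \big)$$

in the proper product space. According to [9], this process captures all
aspects of extremal dependence, both within space and over time.

Theorem 2.3. Setting $a_i^{(j)} = 0$ if $i < 0$, under condition (2.2), a CM3
process $(X_t)_{t \in \mathbb{Z}}$ is jointly regularly varying with index
$\alpha = 1$ and spectral process

$$\Theta_t = \frac{a_{t+I}^{(J)}}{\|a_I^{(J)}\|_\infty}, \qquad t \in \mathbb{Z},$$

where $(I, J)$ is a random vector on $\mathbb{Z}_+ \times \mathbb{Z}$ having
distribution

$$P[(I, J) = (i, j)] = \frac{\|a_i^{(j)}\|_\infty}{\sum_{l \in \mathbb{Z}} \sum_{k \ge 0} \|a_k^{(l)}\|_\infty}, \qquad i \in \mathbb{Z}_+,\ j \in \mathbb{Z}.$$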

All CM3 processes satisfying (2.2) also satisfy the finite-cluster condition.
This property prevents a sequence of extremes occurring in a CM3 from being
infinite over time, even if $K = +\infty$ or $L = +\infty$.

Theorem 2.4. Under condition (2.2), a CM3 process $(X_t)_{t \in \mathbb{Z}}$
satisfies the finite-cluster condition: there exists $(r_n)_{n \in \mathbb{N}}$
with $r_n \to \infty$ and $r_n/n \to 0$ such that

$$\lim_{m \to \infty} \limsup_{n \to \infty} P\Big( \max_{m \le |t| \le r_n} \|X_t\|_\infty > n \,\Big|\, \|X_0\|_\infty > n \Big) = 0. \qquad (C)$$

Together with the finite-cluster condition, the strong mixing property leads 
to nice properties. To obtain the strong mixing property a sufficient condition 
is 

Note that (2.2) and (2.3) are trivial whenever $K < +\infty$ and $L < +\infty$.

Theorem 2.5. Under condition (2.3), a CM3 process $(X_t)_{t \in \mathbb{Z}}$
satisfies the strong mixing condition:

$$\lim_{m \to \infty} \sup_{\substack{A \in \sigma(-\infty, -m) \\ B \in \sigma(m, \infty)}} |P(A \cap B) - P(A)P(B)| = 0, \qquad (M)$$

where $\sigma(r, s)$ is the $\sigma$-field generated by
$\{X_t \mid r \le t \le s\}$.



If $(X_t)_{t \in \mathbb{Z}}$ is a regularly varying time series with index 1,
the extremal index $\theta$ of the univariate time series
$(\|X_t\|_\infty)_{t \in \mathbb{Z}}$ is defined as the quantity between 0 and
1 such that

$$P\Big( \max_{1 \le t \le n} \|X_t\|_\infty \le nx \Big) \to e^{-\theta/x}$$

as $n \to \infty$. The extremal index of a CM3 process is the following.

Proposition 2.6. Under condition (2.2), if {Xt)tez is a CMS process, the 
extremal index of {\\Xt\\oo)t£Z is 



E 



max 1 1 a 

i>0 



(i)i 



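Once conditions (C) and (M) are satisfied, which is the case under (2.3)
by Theorem 2.4 and Theorem 2.5, there are further characterizations of the
extremal index, such as

$$\theta = \lim_{t \to \infty} \lim_{x \to \infty} P\Big( \max_{1 \le i \le t} \|X_i\|_\infty \le x \,\Big|\, \|X_0\|_\infty > x \Big) \qquad (2.4)$$

and, in this case, $1/\theta$ is the expected size of clusters of extremes in
the sense of [12], which is recalled as follows. Let $u_n \to \infty$ be a
thresholding sequence and $r_n \to \infty$ be such that the expected number of
exceedances in a sample of size $r_n$ tends toward 0:

$$E\Big[ \sum_{i=1}^{r_n} 1(\|X_i\|_\infty > u_n) \Big] = r_n P(\|X_1\|_\infty > u_n) \to 0.$$

Then, denoting $M_n := \max\{\|X_1\|_\infty, \dots, \|X_{r_n}\|_\infty\}$,
under (C) and (M), we have

$$E\Big[ \sum_{i=1}^{r_n} 1(\|X_i\|_\infty > u_n) \,\Big|\, M_n > u_n \Big] = \frac{r_n P(\|X_1\|_\infty > u_n)}{P(M_n > u_n)} \to \frac{1}{\theta}$$

as $n \to \infty$.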



3. Block profiles 



In this section we study further probabilistic features of CM3 processes in
order to build a method to estimate the parameter functions $a_i^{(j)}$ in the
case $0 \le i < K$ and $1 \le j \le L$. The theoretical model for the rest of
the paper is thus

$$X_t(x) = \max_{1 \le j \le L} \max_{0 \le i < K} a_i^{(j)}(x)\, Z_{t-i}^{(j)}, \qquad (x \in [0,1]^d). \qquad (3.1)$$

The estimation method suggested in this paper is based on the fact that a
large value of $Z_t^{(j)}$ causes large values of $X_i$ for $t \le i < t + K$,
with the possibility of having in this case

$$(X_t, \dots, X_{t+K-1}) = Z_t^{(j)}\, (a_0^{(j)}, \dots, a_{K-1}^{(j)}). \qquad (3.2)$$

By "block profile" we mean a sequence $(X_t, \dots, X_{t+K-1})$ satisfying
(3.2) for some $1 \le j \le L$. The corresponding sequence of functions
$(a_0^{(j)}, \dots, a_{K-1}^{(j)})$ will be called "profile" or "pattern".

In 3.1 we compute the probability of the events (3.2) and their frequencies
of occurrence for the different values of j. In 3.2 we have a brief look at
the correlation between all the possible blocks of length K available in a
sample. They are not independent if they overlap. In 3.3 we give the sample
size needed to expect that an event of the form (3.2) occurs at least once.

To compute the exact values, the knowledge of the parameter functions is
needed, which is precisely not the case in estimation. This is the reason
why we also give lower and upper bounds for the true values. These bounds
only depend on a single parameter C, which is the maximal variation among
the parameter functions $a_i^{(j)}$.

3.1. Relative frequencies

To recover the functions $a_i^{(j)}$ from a sample of size T, the first step
is to understand how (3.1) works. Consider as an example a simple situation
where K = 3 and L = 2. A finite number of functions $a_i^{(j)}$ is uniformly
bounded below by a positive constant since they are strictly positive. Thus,
if for instance $Z_1^{(2)}$ is large enough, the value of
$(X_1, \dots, X_K)$ at a given position $x \in [0,1]^d$ is

$$\begin{aligned}
X_1(x) &= \max \begin{pmatrix} a_0^{(1)}(x) Z_1^{(1)} & a_1^{(1)}(x) Z_0^{(1)} & a_2^{(1)}(x) Z_{-1}^{(1)} \\ a_0^{(2)}(x) Z_1^{(2)} & a_1^{(2)}(x) Z_0^{(2)} & a_2^{(2)}(x) Z_{-1}^{(2)} \end{pmatrix}, \\
X_2(x) &= \max \begin{pmatrix} a_0^{(1)}(x) Z_2^{(1)} & a_1^{(1)}(x) Z_1^{(1)} & a_2^{(1)}(x) Z_0^{(1)} \\ a_0^{(2)}(x) Z_2^{(2)} & a_1^{(2)}(x) Z_1^{(2)} & a_2^{(2)}(x) Z_0^{(2)} \end{pmatrix}, \\
X_3(x) &= \max \begin{pmatrix} a_0^{(1)}(x) Z_3^{(1)} & a_1^{(1)}(x) Z_2^{(1)} & a_2^{(1)}(x) Z_1^{(1)} \\ a_0^{(2)}(x) Z_3^{(2)} & a_1^{(2)}(x) Z_2^{(2)} & a_2^{(2)}(x) Z_1^{(2)} \end{pmatrix},
\end{aligned} \qquad (3.3)$$

where each maximum is attained by the entry containing $Z_1^{(2)}$,
so that the second pattern appears:



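$$(X_1, X_2, X_3)(x) = (a_0^{(2)}, a_1^{(2)}, a_2^{(2)})(x)\, Z_1^{(2)}.$$

How likely are events of this kind? To compute their probabilities, first
remark that this event, required at every position x, is equivalent to

$$\begin{aligned}
Z_{-1}^{(1)} &\le \min\Big[ \frac{a_0^{(2)}}{a_2^{(1)}} \Big] Z_1^{(2)}, &
Z_{-1}^{(2)} &\le \min\Big[ \frac{a_0^{(2)}}{a_2^{(2)}} \Big] Z_1^{(2)}, \\
Z_0^{(1)} &\le \min\Big[ \frac{a_0^{(2)}}{a_1^{(1)}}, \frac{a_1^{(2)}}{a_2^{(1)}} \Big] Z_1^{(2)}, &
Z_0^{(2)} &\le \min\Big[ \frac{a_0^{(2)}}{a_1^{(2)}}, \frac{a_1^{(2)}}{a_2^{(2)}} \Big] Z_1^{(2)}, \\
Z_1^{(1)} &\le \min\Big[ \frac{a_0^{(2)}}{a_0^{(1)}}, \frac{a_1^{(2)}}{a_1^{(1)}}, \frac{a_2^{(2)}}{a_2^{(1)}} \Big] Z_1^{(2)}, & & \\
Z_2^{(1)} &\le \min\Big[ \frac{a_1^{(2)}}{a_0^{(1)}}, \frac{a_2^{(2)}}{a_1^{(1)}} \Big] Z_1^{(2)}, &
Z_2^{(2)} &\le \min\Big[ \frac{a_1^{(2)}}{a_0^{(2)}}, \frac{a_2^{(2)}}{a_1^{(2)}} \Big] Z_1^{(2)}, \\
Z_3^{(1)} &\le \min\Big[ \frac{a_2^{(2)}}{a_0^{(1)}} \Big] Z_1^{(2)}, &
Z_3^{(2)} &\le \min\Big[ \frac{a_2^{(2)}}{a_0^{(2)}} \Big] Z_1^{(2)},
\end{aligned} \qquad (3.4)$$

where each minimum is taken over the listed ratios and over
$x \in [0,1]^d$.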



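For the general case, let $A(l^*)$ be the event (3.2) with fixed $j = l^*$:

$$A(l^*) = \{ \forall x \in [0,1]^d : (X_t, \dots, X_{t+K-1})(x) = Z_t^{(l^*)}\, (a_0^{(l^*)}, \dots, a_{K-1}^{(l^*)})(x) \},$$

i.e. $A(l^*)$ is the event for a K-block starting at time t to be a block
profile of type $l^*$. Generalizing (3.4) shows that the event $A(l^*)$ is the
intersection of $(2K - 1)L - 1$ conditions involving the random variables
$Z_i^{(l)}$ for $t - K + 1 \le i \le t + K - 1$. Remembering that the density
of Z is $f_Z(z) = z^{-2} \exp(-z^{-1})$, the probability of $A(l^*)$ is

$$\begin{aligned}
P(A(l^*)) = \int_0^\infty
& \prod_{l=1}^{L} P\Big( Z \le \min\Big[ \frac{a_0^{(l^*)}(x)}{a_{K-1}^{(l)}(x)} \Big] z \Big) \\
\times & \prod_{l=1}^{L} P\Big( Z \le \min\Big[ \frac{a_0^{(l^*)}(x)}{a_{K-2}^{(l)}(x)}, \frac{a_1^{(l^*)}(x)}{a_{K-1}^{(l)}(x)} \Big] z \Big) \\
\times & \cdots \\
\times & \prod_{l \ne l^*} P\Big( Z \le \min\Big[ \frac{a_0^{(l^*)}(x)}{a_0^{(l)}(x)}, \dots, \frac{a_{K-1}^{(l^*)}(x)}{a_{K-1}^{(l)}(x)} \Big] z \Big) \\
\times & \cdots \\
\times & \prod_{l=1}^{L} P\Big( Z \le \min\Big[ \frac{a_{K-1}^{(l^*)}(x)}{a_0^{(l)}(x)} \Big] z \Big)\, z^{-2} \exp(-z^{-1})\, dz.
\end{aligned} \qquad (3.5)$$

Denoting by $m_k^{(l:l^*)}$ the minimum written on line k of (3.5) and, by
extension, $m_K^{(l^*:l^*)} = 1$, the value of $p^{(l^*)} := P(A(l^*))$ is

$$p^{(l^*)} = \frac{1}{\sum_{k=1}^{2K-1} \sum_{l=1}^{L} 1/m_k^{(l:l^*)}} = \frac{\mathrm{harm}(m_1^{(1:l^*)}, \dots, m_{2K-1}^{(L:l^*)})}{(2K-1)L}, \qquad (3.6)$$

where harm is the harmonic mean of the $(2K - 1)L$ minima.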
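
The probability of a block profile can be evaluated numerically from the parameter functions once they are tabulated on a grid. The sketch below is an illustration under assumptions, not the paper's implementation: the function name `profile_probability` and the layout `a[l, i, d]` are hypothetical, and the minimum over $x \in [0,1]^d$ is approximated by the minimum over the grid points. Line k of the product involves the ratios $a_i^{(l^*)}/a_{i+K-k}^{(l)}$ for the admissible lags i.

```python
import numpy as np

def profile_probability(a, l_star):
    """Probability that a random K-block is a block profile of type l_star.

    a : array of shape (L, K, D), a[l, i, d] = a_i^{(l+1)}(x_d) (assumed layout)
    """
    L, K, _ = a.shape
    inverse_sum = 0.0
    for l in range(L):
        for k in range(1, 2 * K):               # lines k = 1, ..., 2K - 1
            ratios = [
                (a[l_star, i] / a[l, i + K - k]).min()   # min over grid points
                for i in range(K)
                if 0 <= i + K - k < K
            ]
            # For l = l_star and k = K every ratio is 1, as in the text.
            inverse_sum += 1.0 / min(ratios)
    # 1 / sum(1/m) = harmonic mean of the minima divided by (2K - 1) L
    return 1.0 / inverse_sum
```

For K = 1 every block is a profile of exactly one type, so the probabilities over the L types sum to one, which gives a quick sanity check.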

It is thus possible to compute $p^{(l^*)}$ exactly given the parameter
functions. If the parameter functions $a_i^{(l)}$ satisfy



1 <^<^hJ^^C^n (3.7) 



for all $x, k, k', l, l^*$, then the probability $p^{(l^*)}$ of success to
reveal $(a_0^{(l^*)}, \dots, a_{K-1}^{(l^*)})$ by picking up a random block
satisfies

^ - C('*)(2if-1)L ^(2/f-l)L-^ • ^■^■^> 

The harmonic mean being more sensitive to small values, the lower bound is 
actually closer to the exact probability. 

Under the knowledge of K, the probability that a random K-block is a profile
differs from pattern to pattern. But under the control condition (3.7), if
all the $C^{(l)}$ are themselves bounded above by a common constant C, the
probability $p = p^{(1)} + \dots + p^{(L)}$ for a random block to be any
profile can be estimated by

which has the remarkable property of depending neither on L nor on the
dimension of the ambient space.

As an illustration, Table 1 shows the number of block profiles found versus
their expectations, with and without knowledge of the parameter functions,
for five simulations of (4.1). The different patterns are split in columns.



3.2. Correlation

As we have seen in paragraph 3.1, we need
$(2K-1)L / \mathrm{harm}(m_1^{(1:l)}, \dots, m_{2K-1}^{(L:l)})$ independent
random blocks from the series $(X_t)_{t \in \mathbb{Z}}$ to expect that at
least one is proportioned like the $l$-th profile. In practice, given a chain
$X_1, \dots, X_T$ with T observations, we have $T - K + 1$ dependent blocks of
length K. The main pieces of information about the dependence structure
between these blocks can be summarized in the following way.

Simulation with parameters C = 5, D = 20, K = 5, L = 5 and T = 5000

     Expected value (p^(l) T)         pT            Really found
  #1   #2   #3   #4   #5  total     total    total   #1   #2   #3   #4   #5
  31   31   30   34   33   159       135      147    28   20   25   30   44
  31   32   33   29   36   161       135      159    30   20   43   36   30
  35   35   31   31   33   166       135      166    31   35   24   30   46
  32   32   36   33   33   165       135      160    33   26   43   27   31
  30   30   30   31   33   154       135      143    31   28   32   19   33

Table 1

Estimation of the number of block profiles versus values obtained in simulation.



i) If a K-block $(X_t, \dots, X_{t+K-1})$ is a block profile, it does not
overlap with another block profile.

ii) Given that the K-block $(X_t, \dots, X_{t+K-1})$ is not a profile, the
probability that one of the $K - 1$ next K-blocks
$(X_{t+1}, \dots, X_{t+K})$, ... is a profile is higher.

iii) If consecutive blocks are not profiles, the probability that the next
one is a profile stops increasing after K non-profile blocks.

To see i), for instance, have a look at the third matrix in (3.3). If
$(X_1, \dots, X_3)$ is like the second profile, then in particular

$$a_2^{(2)}(x) Z_1^{(2)} > a_0^{(1)}(x) Z_3^{(1)}.$$

But for $(X_3, \dots, X_5)$ to be like the first profile, we must have

$$a_0^{(1)}(x) Z_3^{(1)} > a_2^{(2)}(x) Z_1^{(2)}.$$

Then $(X_3, \dots, X_5)$ cannot be proportioned like the first profile.
Proceed similarly for any two non-disjoint blocks and any two different
profiles.

To see ii), if for instance $(X_1, \dots, X_3)$ in (3.4) is not a profile,
that means that at least one of the 15 inequalities is not satisfied,
although we do not know precisely how many. Some of the reverse inequalities
lie in the conditions for the $K - 1$ next blocks to be profiles and some do
not. To get the exact incidence, we need to condition on the number of
inequalities not satisfied in (3.4) and whether or not they participate in
the conditions for the next blocks to be profiles. In any case, the
probability that the next blocks are profiles increases.

To see iii), simply remark that the process is K-dependent.



3.3. Sample size 

As a consequence of paragraph 3.2, the expected number of profiles of type
$l$ is greater than unity in a chain of length at least

$$T \ge K - 1 + \frac{(2K-1)L}{\mathrm{harm}(m_1^{(1:l)}, \dots, m_{2K-1}^{(L:l)})}. \qquad (3.10)$$



T. Meinguet/CM3 processes 






Given an upper bound C on all the $C^{(l)}$ in (3.7), the minimum sample size
needed to expect at least M repetitions of a particular profile in the chain
is

$$T \ge K - 1 + CM(2K - 1) \qquad (3.11)$$

if we do not know the parameter functions but only C.



4. Estimation 

The estimation methodology for the parameter functions $a_i^{(l)}$ starts
from a discretization at D points $x_d$ ($1 \le d \le D$) of the domain. CM3
processes from (3.1) are seen as a high-dimensional M4. Furthermore we may
want to consider independent and normally distributed errors with variance
$\sigma^2$ at each measurement point. Thus the "practical model" studied in
this section is

$$X_t(d) = \max_{1 \le l \le L} \max_{0 \le i < K} a_i^{(l)}(x_d)\, Z_{t-i}^{(l)} + \varepsilon_t(x_d) \qquad (1 \le d \le D) \qquad (4.1)$$

where the $(Z_t^{(l)})_{t \in \mathbb{Z}}$ are independent unit-Fréchet, the
$\varepsilon_t(x_d)$ are independent $N(0, \sigma^2)$ random variables, the
$a_i^{(l)}(x)$ are positive continuous functions defined on $[0,1]^d$ and for
every $x \in [0,1]^d$ we have that
$\sum_{l \in \mathbb{Z}} \sum_{i \ge 0} a_i^{(l)}(x) = 1$.

It is important to note that the profiles
$(a_0^{(l)}, \dots, a_{K-1}^{(l)})$ contain not only the information about the
shapes of the profile but also, according to 3.1, their probability of
occurrence. The shapes will be denoted $a_{i,0}^{(l)}$ and their frequencies
of occurrence $f^{(l)}$. That is

$$a_i^{(l)} = \alpha^{(l)} a_{i,0}^{(l)} \qquad (4.2)$$

where the coefficients $\alpha^{(l)}$ must be chosen so that
$f^{(l)} = p^{(l)}$ in 3.1.

The first step of the procedure is to estimate the length of the tail
dependence K. This is done in 4.1 by taking the average size of the clusters
of exceedances over a threshold. Next, blocks of extremes are selected to
estimate the number of patterns L, the shapes $a_{i,0}^{(l)}$ of the parameter
functions and their frequencies $f^{(l)}$. The algorithm to locate the blocks
of extremes explained in 4.2 is based on a multivariate approach. The value
of L is determined in 4.3 as the number of clusters among the chosen blocks
of extremes. The functions $a_{i,0}^{(l)}$ are yielded by the natural output
(centroids, medoids, ...) of the partitioning algorithm used to determine L,
and the values $f^{(l)}$ by the sizes of the different clusters. Then the
solution $a_i^{(l)}$ of (4.2) is obtained in 4.4 thanks to an iterative
algorithm. To measure the quality of the estimation, the dissimilarity
between the original parameter functions and their estimations is quantified
in 4.5 in terms of the Hausdorff distance.



4.1. Estimation of the length of the tail dependence (K)

Eight estimators of K have been tested for the model (4.1) with $\sigma = 0$
when the knowledge of the parameter functions $a_i^{(l)}$ is replaced by the
range C given in (3.7).

The first step is to select the values considered as extremes. A first
approach consists of working on $(\|X_t\|_\infty)$ and choosing the values
above the threshold $\max_t(\|X_t\|_\infty)/C$. This will be referred to as
the scalar version. A second approach is to use all the available
information by doing the previous operation for the D components of $(X_t)$
separately. In this case the threshold also depends on the location. This
method will be referred to as the multivariate version.

Once the extremes are selected, a runs declustering generates a sequence
$(s_n)$ of all sizes of clusters of extremes found in the univariate or
multivariate scan. More precisely, only contiguous extremes were considered
here to make a cluster. This is the runs declustering with r = 0.

From the sequence $(s_n)$, we estimate K through mean($s_n$), median($s_n$)
or mode($s_n$) with the following nomenclature.





  Average   Time series            Threshold (scalar or vector)
  mean      $(\|X_t\|_\infty)$     $\max_t(\|X_t\|_\infty)/C$
  mean      $(X_t)$                $(\max_t(X_t(d))/C)_{1 \le d \le D}$
  median    $(\|X_t\|_\infty)$     $\max_t(\|X_t\|_\infty)/C$
  median    $(X_t)$                $(\max_t(X_t(d))/C)_{1 \le d \le D}$
  mode      $(\|X_t\|_\infty)$     $\max_t(\|X_t\|_\infty)/C$
  mode      $(X_t)$                $(\max_t(X_t(d))/C)_{1 \le d \le D}$



The ceil or floor options to get rid of the decimals are used to build the eight 
following estimators: 

= ceil(ir^), i:3 = ceil(if/^), k^=ciA\{Kra), k^ = ko, 
k2 - round(if^), ki = round(iff ), k^ = CGi\(K^), ks = k^. 
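
The scalar version of the procedure (threshold, runs declustering with r = 0, then an average of the cluster sizes) can be sketched as follows. This is an illustration only: the function names `cluster_sizes` and `estimate_K_scalar` are hypothetical, the sup-norm is approximated by the maximum over the D grid points, and taking the smallest mode in case of ties is an assumption that the text does not specify. Rounding (ceil, round) is left to the caller.

```python
import numpy as np
from statistics import mean, median, multimode

def cluster_sizes(extreme):
    """Sizes of maximal runs of consecutive extremes (runs declustering, r = 0)."""
    sizes, run = [], 0
    for flag in extreme:
        if flag:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes

def estimate_K_scalar(X, C):
    """Raw estimates of K from an array X of shape (T, D) and the range C."""
    norms = np.abs(X).max(axis=1)               # ||X_t||_inf on the grid
    sizes = cluster_sizes(norms > norms.max() / C)
    return {
        "mean": mean(sizes),
        "median": median(sizes),
        "mode": min(multimode(sizes)),          # assumption: smallest mode on ties
    }
```

The multivariate version would apply the same thresholding to each of the D components before the declustering step.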

Figure 2 shows the success rate of the eight estimators of K against the
length of the simulated chain. The tests were performed with N = 50000 trials
at each step: for C from 1 to 10, D from 1 to 20, K from 1 to 5, L from 1 to
5, $\sigma = 0$ and, for each of these parameters, 10 different sets of
coefficients $a_i^{(l)}$ randomly generated (uniformly, without time or space
correlation).

According to these empirical results, the winner for T 35 is the univariate 
version and the mode as average cluster size. If T ^ 35 the best success rate is 
obtained with the multivariate version and the median. 

4.2. Extremal clustering

Once we know the length K of the tail dependence thanks to 4.1, the next
step of the procedure studied here to recover the parameter functions
$a_i^{(l)}$ of a theoretical CM3 process (3.1) is to locate the blocks of
extremes. Indeed, according to 3.2, the probability that at least one block
of length K in the chain is a block profile of type $l$ in a sample of size T
is greater than

$$1 - (1 - p^{(l)})^{T-K+1}$$

which tends to 1 as T tends to infinity.



Fig 2. Proportion of success in estimating K (success rate, from 0.10 to
1.00, against the length of the simulated chain, from 10 to 10000).



The suggested method locates the positions of blocks of extremes in the
practical model (4.1) by maximizing the "likelihood" of being a multivariate
extreme. This idea comes from the wish not to lose information across the D
dimensions. Nevertheless a bad situation can still happen when, for some
$0 \le i^* < K$, all $a_{i^*}^{(l)}(x_d)$ are negligible in comparison with
the $a_i^{(l)}(x_d)$, $i \ne i^*$, for instance. If the points
$x_1, \dots, x_D$ of the discretization are far enough apart to obtain
independent-like patterns, it is unlikely that all the
$a_{i^*}^{(l)}(x_d)$ are negligible at the same time.

We explain the method on the following example with D = 2 and K = 3:

  t         1   2   3   4   5   6   7
  X_t(1)    5   3   4  14  19   2   7
  X_t(2)    6   1  10   5   2   1   4

First step

Using the order statistics, mark the K largest values in each of the D
chains by 1.

  d = 1     0   0   0   1   1   0   1
  d = 2     1   0   1   1   0   0   0

Second step

Compute the sum of the extremal status for each t.

  d = 1     0   0   0   1   1   0   1
  d = 2     1   0   1   1   0   0   0
  sum       1   0   1   2   1   0   1

Third step

Compute the moving sum (MS) of order K. This is considered as the likelihood
of having a large value at time t among the $(Z_t^{(l)})_{1 \le l \le L}$.

  sum       1   0   1   2   1   0   1
  MS        1   1   2   3   4   3   2

Extract the profile

Find the index t that maximizes the moving sum. Then

$$(X_{t-K+1}, \dots, X_t) / \|X_{t-K+1}\|_\infty \qquad (4.3)$$

is the shape of the first block profile to store in the memory:

  MS        1   1   2   3   4   3   2
  X_t(1)    5   3   4  14  19   2   7
  X_t(2)    6   1  10   5   2   1   4

  Shape_1(x_1) = (4/10, 14/10, 19/10)
  Shape_1(x_2) = (10/10, 5/10, 2/10)

To decide between multiple maxima, for instance if

  MS*       1   1   2   4   2   4   4

we first choose the single maximum. If there are consecutive maxima, as a
second criterion, we take the block that maximizes
$\sum_{t \in \text{block}} \|X_t\|_\infty$ among those.

Repeat this loop until having gathered the desired number Q of time-disjoint
blocks of extremes of length K (Q = pT is the suggestion of 3.1 if we only
know a uniform bound C on the variation of the parameter functions).
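
The three steps and the extraction of one block can be sketched in code on the example with D = 2 and K = 3. This is a minimal illustration, not the implementation used for the simulations: the function name `locate_block` is hypothetical, the moving sum is taken as a trailing sum over the last K time points, and the tie-breaking rules described above are not implemented (the first maximum among complete blocks is taken).

```python
import numpy as np

def locate_block(X, K):
    """Locate one block of extremes in an array X of shape (T, D).

    Returns the 0-based end index of the block, the moving sum and the shape.
    """
    T, D = X.shape
    marks = np.zeros((T, D), dtype=int)
    # First step: flag the K largest values of each of the D chains.
    top = np.argsort(X, axis=0)[-K:]                   # shape (K, D)
    marks[top, np.arange(D)] = 1
    # Second step: sum of the extremal status at each time t.
    total = marks.sum(axis=1)
    # Third step: trailing moving sum of order K.
    padded = np.concatenate([np.zeros(K - 1, dtype=int), total])
    ms = np.convolve(padded, np.ones(K, dtype=int), mode="valid")
    # Extraction: first maximum among the blocks that fit in the sample.
    t_end = K - 1 + int(np.argmax(ms[K - 1:]))
    block = X[t_end - K + 1 : t_end + 1]
    shape = block / np.abs(block[0]).max()             # normalisation by ||X_{t-K+1}||
    return t_end, ms, shape

X = np.array([[5, 6], [3, 1], [4, 10], [14, 5], [19, 2], [2, 1], [7, 4]], dtype=float)
t_end, ms, shape = locate_block(X, K=3)
```

On this example the moving sum and the two shapes agree with the values displayed in the worked example; collecting Q time-disjoint blocks would require masking the selected block and repeating the search.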
4.3. Estimation of the number of patterns (L)

With Q blocks of extremes of length K normalized as in (4.3), the goal is to
estimate the number of reproducible patterns L in the observed process. To do
this, we create a $Q \times KD$ table in which each of the Q rows is made of
the D temporal vectors of length K placed successively. We estimate L by the
number of clusters among the rows of the table.



Partitioning methods 

To break the rows of the table up into groups, we tried several algorithms,
among which five retained our attention: hierarchical clustering with Ward's
aggregation criterion, hierarchical clustering with the Euclidean distance
between the centroids [4, 21], k-means with the squared Euclidean distance,
k-means with Pearson's correlation after standardization [11, 16] and finally
Partitioning Around Medoids (PAM) with the classical Euclidean distance
[19, 20].

Number of clusters 

For each of those algorithms we implemented two criteria to determine the
number of clusters. With the first, one stops when the percentage of the
total variance not explained by the clustering is less than 20%, i.e. when

$$\frac{SSE}{SStot} = \frac{\sum_{\text{cluster}} (n_{\text{obs}}(\text{cluster}) - 1) \sum_{\text{variable in cluster}} \sigma^2(\text{variable})}{(Q - 1) \sum_{\text{variable in table}} \sigma^2(\text{variable})} \le 0.2.$$

We refer to this method as the elbow method [6, 8] (see Figure 4).
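
The ratio used by the elbow criterion can be computed directly from the table and a vector of cluster labels. The sketch below is illustrative: the function name `unexplained_variance_ratio` is hypothetical, and it relies on the identity that $(n - 1)$ times a sample variance is the corresponding sum of squared deviations, so that both numerator and denominator are sums of squares.

```python
import numpy as np

def unexplained_variance_ratio(table, labels):
    """SSE / SStot for a (Q, KD) table with one cluster label per row."""
    table = np.asarray(table, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for c in np.unique(labels):
        rows = table[labels == c]
        # (n_c - 1) * (sum of the within-cluster variances of the variables)
        sse += ((rows - rows.mean(axis=0)) ** 2).sum()
    # (Q - 1) * (sum of the variances of the variables in the whole table)
    sstot = ((table - table.mean(axis=0)) ** 2).sum()
    return sse / sstot
```

Under the stopping rule described above, the number of clusters would be increased until this ratio falls below 0.2.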




Fig 4. Elbow: With L = 5, perfect clustering on the left, CM3 with errors on the right.



The second method to find the number of clusters takes the first value that
yields a total silhouette TtSil for the clustering above 85% of Q. Let $a(q)$
be the average distance between the $q$-th observation and the members of its
own cluster. Then repeat this operation between the $q$-th observation and
the members of each of the other clusters, and set $b(q)$ to the lowest value
found. The silhouette $s(q)$ of the $q$-th observation is

$$s(q) = \frac{b(q) - a(q)}{\max\{a(q), b(q)\}}.$$

Thus $-1 \le s(q) \le 1$ and $s(q)$ measures how well the $q$-th observation
fits its own cluster [5, 7, 10]: values close to 1 indicate a good fit. The
distance taken into account here is the squared Euclidean distance. We stop
the partitioning at the smallest number of clusters satisfying

$$\frac{TtSil}{Q} = \frac{1}{Q} \sum_{q=1}^{Q} s(q) \ge 0.85$$

if this occurs. We refer to this method as the silhouette method (see Figure 5).
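
The silhouette criterion can be sketched in the same spirit. The function name `mean_silhouette` is hypothetical, the observation itself is excluded when averaging over its own cluster, and setting the silhouette of a singleton cluster to 0 is a common convention assumed here rather than taken from the text. At least two clusters are required.

```python
import numpy as np

def mean_silhouette(table, labels):
    """TtSil / Q with squared Euclidean distances between the rows of the table."""
    table = np.asarray(table, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between observations.
    d2 = ((table[:, None, :] - table[None, :, :]) ** 2).sum(axis=2)
    s = np.zeros(len(table))
    for q in range(len(table)):
        own = labels == labels[q]
        own[q] = False                      # exclude the observation itself
        if not own.any():
            continue                        # singleton cluster: s(q) = 0 (assumed convention)
        a = d2[q, own].mean()
        b = min(d2[q, labels == c].mean()
                for c in np.unique(labels) if c != labels[q])
        s[q] = (b - a) / max(a, b)
    return s.mean()
```

The stopping rule then keeps the smallest number of clusters for which the returned value is at least 0.85.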



Fig 5. Silhouette: Perfect clustering on the left, CM3 with errors on the
right (L = 5). Both panels plot TtSil/Q against the number of clusters.



Both methods are unable to detect that L = 1. We have thus set L = 1 when
the estimated variance of each variable is less than 0.005.

Estimation 

For $\sigma = 0$, Figure 6 shows the success rate of the following eleven
estimators of L against the length of the simulated chain.

                 Clustering     Distance                Number of clusters
  $\hat L_1$     Hierarchical   Ward                    Elbow
  $\hat L_2$     Hierarchical   Ward                    Silhouette
  $\hat L_3$     Hierarchical   Euclidean / centroids   Elbow
  $\hat L_4$     Hierarchical   Euclidean / centroids   Silhouette
  $\hat L_5$     k-means        Euclidean squared       Elbow
  $\hat L_6$     k-means        Euclidean squared       Silhouette
  $\hat L_7$     k-means        Pearson's correlation   Elbow
  $\hat L_8$     k-means        Pearson's correlation   Silhouette
  $\hat L_9$     PAM            Euclidean               Elbow
  $\hat L_{10}$  PAM            Euclidean               Silhouette
  $\hat L_{11}$  Consensus: $\hat L_{11} = \mathrm{mode}(\hat L_1, \dots, \hat L_{10})$



The tests were performed with N = 6000 trials at each step: for C in
{2, 4, 6, 8, 10}, D in {1, 5, 10, 15, 20}, K from 2 to 5 and L from 1 to 5.
The number of blocks of extremes chosen to build the table is
Q = min(ceil(pT), 100) with p from 3.1. We added the constraint that it has
to be possible to see each profile once, i.e. $(K + 1)L \le T$, to exclude
challenges such as finding 5 profiles of length 5 in a chain of length 10.

Fig 6. Proportion of success in estimating L (success rate, from 0.10 to
1.00, against the length of the simulated chain, from 10 to 10000).

The combination of PAM and elbow yields the best success rate for small
sample sizes $T \le 50$. For $T \ge 50$ Ward's algorithm with the silhouette
is the best. Here the consensus curve does not provide a better performance.

The gap between T = 20 and T = 500 is an intermediate area between a
situation where the trivial case Q = L frequently occurs and the emergence of
asymptotic properties.

These results are not excellent but, since the implemented algorithm is the
"eye" of the analyst, it only sees what we want it to see. For real
applications it probably does not matter if infrequent profiles are missed.
Moreover, optimizing C, Q and the thresholds for a specific sample can
improve the performance for that particular situation.



4.4. Recovering the parameter functions $a_i^{(l)}$

Once we have obtained the length K of the tail dependence in 4.1 and the
number L of different patterns in 4.3, it remains to estimate the parameter
functions $a_i^{(l)}$ ($0 \le i < K$, $1 \le l \le L$). Depending on the
partitioning method used to estimate L, we choose for the shapes
$a_{i,0}^{(l)}$ of the L profiles the natural output of the algorithm: the
centroids for the hierarchical clustering and k-means, and the medoids with
PAM. Essentially the difference is that in the first case the estimator of a
shape is a mean of observations and in the second case a median observation.

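The relationship between the parameter functions $a_i^{(l)}$, the shapes
$a_{i,0}^{(l)}$ and their frequencies of occurrence is given by (3.6). The
theoretical probabilities of occurrence of the different profiles must match
the empirical frequencies $(f^{(1)}, \dots, f^{(L)})$ in the table of 4.3.
Thus the last step is to normalize the $a_{i,0}^{(l)}$ to obtain $a_i^{(l)}$
such that on the one hand

$$\mathrm{harm}\big( (m_k^{(l:1)})_{k,l} \big) / f^{(1)} = \dots = \mathrm{harm}\big( (m_k^{(l:L)})_{k,l} \big) / f^{(L)} \qquad (4.4)$$

($1 \le k \le 2K - 1$, $1 \le l \le L$) and on the other hand

$$\sum_{l=1}^{L} \sum_{i=0}^{K-1} a_i^{(l)}(x_d) = 1 \qquad (1 \le d \le D). \qquad (4.5)$$

We seek a relation of the type $a_i^{(l)} = \alpha^{(l)} a_{i,0}^{(l)}$
($1 \le l \le L$), which is a problem of rank L. The solution can be obtained
iteratively: if $a_{i,n}^{(l)}$ is the $n$-th update of $a_{i,0}^{(l)}$ and
$(p_n^{(1)}, \dots, p_n^{(L)})$ the probabilities given by (3.6) at step n,
then define $a_{i,n+1}^{(l)}$ by

$$a_{i,n+1}^{(l)} = \frac{f^{(l)}}{p_n^{(l)}}\, a_{i,n}^{(l)} \qquad (4.6)$$

and stop when the error
$\|(f^{(1)}, \dots, f^{(L)}) - (p_{n+1}^{(1)}, \dots, p_{n+1}^{(L)})\|_\infty$
is small. The iterative algorithm (4.6), if it converges, converges to the
solution of (4.4). Indeed, let

$$M_n^{(l^*)} = \begin{pmatrix} m_{1,n}^{(1:l^*)} & \cdots & m_{2K-1,n}^{(1:l^*)} \\ \vdots & & \vdots \\ m_{1,n}^{(L:l^*)} & \cdots & m_{2K-1,n}^{(L:l^*)} \end{pmatrix}$$

be the $n$-th update of the initial collection of numbers
$(m_k^{(l:l^*)})_{k,l}$ given the $a_{i,0}^{(l)}$.
Define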
Define 
  n                  0      1      2      3      4      5      6      7      8

  M_n^(1)          5.95   5.95   5.95   5.95   5.95   5.95   5.95   5.95   5.95
                   2.62   4.70   4.09   4.23   4.19   4.20   4.20   4.20   4.20
  harm(M_n^(1))    3.64   5.25   4.85   4.94   4.91   4.92   4.92   4.92   4.92

  M_n^(2)          6.03   3.36   3.86   3.74   3.77   3.76   3.76   3.76   3.76
                   7.11   7.11   7.11   7.11   7.11   7.11   7.11   7.11   7.11
  harm(M_n^(2))    6.53   4.57   5.01   4.90   4.93   4.92   4.92   4.92   4.92

Table 2

Realization of (4.6) with K = 1, L = 2 and $f_1 = f_2 = 1$, given $M_0^{(1)}$ and $M_0^{(2)}$.



$T$ as the operation on the $M_n^{(l^*)}$ resulting from (4.6), so that

$$T\big(M_n^{(1)}, \dots, M_n^{(L)}\big) = \big(M_{n+1}^{(1)}, \dots, M_{n+1}^{(L)}\big).$$

Observe that (4.4) is equivalent to $T(M^{(1)}, \dots, M^{(L)}) = (M^{(1)}, \dots, M^{(L)})$. Consequently, if $(M_n^{(1)}, \dots, M_n^{(L)})$ converges, since $T$ is continuous, (4.6) converges to the solution of (4.4). The convergence seems generally fast, as the example of Table 2 shows. This yields

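$$a_i^{(l)} = \alpha \, a_{i,0}^{(l)} \, \frac{f^{(l)}}{p_0^{(l)}} \, \frac{f^{(l)}}{p_1^{(l)}} \, \frac{f^{(l)}}{p_2^{(l)}} \cdots \qquad (0 \le i < K, \ 1 \le l \le L), \qquad (4.7)$$

where any $\alpha > 0$ suits to obtain (4.4). To obtain (4.5) simultaneously, we know by 3.1 that the solution of (4.4) and (4.5) together exists and is unique; thus, as $n \to \infty$, the value of $\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} a_{i,n}^{(j)}(x)$ cannot depend on $x$. For numerical reasons, however, it may vary slightly with $x$. We therefore suggest keeping the dependence on $x$ and replacing $\alpha$ in (4.7) by

$$\alpha(x) = \lim_{n \to \infty} \frac{1}{\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} a_{i,n}^{(j)}(x)}.$$

This completes the procedure to estimate the $a_i^{(j)}$ from observations.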
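
The numbers printed in Table 2 can be reproduced, up to rounding, by the following sketch. It assumes that `harm` denotes the harmonic mean (which matches the printed values, e.g. the harmonic mean of 5.95 and 2.62 is about 3.64) and that one step multiplies the varying entry of the first collection by the ratio of the two harmonic means while dividing the varying entry of the second collection by the same ratio. This update rule is inferred from the table and is not a transcription of (4.6); the function name `iterate_table2` is hypothetical.

```python
def harm(values):
    """Harmonic mean of a list of positive numbers."""
    return len(values) / sum(1.0 / v for v in values)

def iterate_table2(m1, m2, steps):
    """Rescale the varying entries until harm(m1) and harm(m2) agree.

    m1 = [5.95, 2.62] and m2 = [6.03, 7.11] are the starting values of
    Table 2 (K = 1, L = 2, equal frequencies). The update rule is an
    assumption inferred from the printed numbers.
    """
    m1, m2 = list(m1), list(m2)
    history = [(harm(m1), harm(m2))]
    for _ in range(steps):
        ratio = harm(m2) / harm(m1)
        m1[1] *= ratio   # varying entry of the first collection
        m2[0] /= ratio   # varying entry of the second collection
        history.append((harm(m1), harm(m2)))
    return m1, m2, history

m1, m2, history = iterate_table2([5.95, 2.62], [6.03, 7.11], steps=50)
```

Under this rule the product of the two varying entries stays constant, and the two harmonic means approach a common value after a few damped oscillations, in line with the fast convergence visible in the table.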

Figure 7 shows an output of the full algorithm using PAM, with measurement 
errors but given the true values of K and L. 



4.5. Distance between two sets of parameter functions

To measure the quality of the estimation, it is necessary to quantify the dissimilarity between the original parameter functions $a_i^{(j)}$ ($0 \le i < K$, $1 \le j \le L$) and their estimations $\hat a_i^{(j)}$ ($0 \le i < \hat K$, $1 \le j \le \hat L$). If $\hat K \ne K$ the estimation can certainly be qualified as bad, so that only the case $\hat K = K$ requires a discussion.

Fig 7. Recovery of the parameter functions with D = 20, K = 2, L = 'i, a = 1 and PAM (T = 500).

The order in which the different patterns are retrieved can change and their total numbers can differ. Consequently the Hausdorff distance between the $L$ graphs of $(a_i^{(j)}(x))_i$ ($1 \le j \le L$) and the $\hat L$ graphs of $(\hat a_i^{(j)}(x))_i$ ($1 \le j \le \hat L$), which are compact in $[0,1]^d \times \mathbb{R}_+^K$ (or in $\{1, \dots, D\} \times \mathbb{R}_+^K \subset \mathbb{R}^{K+1}$ for the discrete version), is perfectly suited. Recall that the Hausdorff distance between nonempty compacts $A$ and $B$ is

$$d_H(A, B) = \max\Big\{ \sup_{a \in A} d(a, B), \ \sup_{b \in B} d(b, A) \Big\},$$

here considered with the Euclidean distance. We thus have to compute $d_H(A, B)$ with

$$A = \bigcup_{j=1}^{L} \operatorname{graph}\big[(a_i^{(j)}(x))_{0 \le i < K}\big] \quad \text{and} \quad B = \bigcup_{j=1}^{\hat L} \operatorname{graph}\big[(\hat a_i^{(j)}(x))_{0 \le i < K}\big]$$

to reach the stated goal.
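
For the discrete version, the computation can be sketched as follows. The array layout `(L, K, D)` and the helper names `graph_points` and `hausdorff` are assumptions made for this illustration; the grid is taken to be $\{1, \dots, D\}$ as in the discrete setting described above.

```python
import numpy as np

def graph_points(functions):
    """Stack the graphs of several discretised profiles into one point cloud.

    functions has shape (L, K, D): L profiles, each made of K functions
    evaluated at the grid indices 1..D. Each point of the cloud is
    (d, a_0(d), ..., a_{K-1}(d)).
    """
    f = np.asarray(functions, dtype=float)
    n_profiles, _, d = f.shape
    grid = np.arange(1, d + 1, dtype=float)
    points = [np.column_stack([grid, f[l].T]) for l in range(n_profiles)]
    return np.vstack(points)

def hausdorff(a, b):
    """Hausdorff distance between two finite point sets (Euclidean metric)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(dist.min(axis=1).max(), dist.min(axis=0).max())

# Toy example with L = 2, K = 1, D = 3 (hypothetical values).
true = np.array([[[1.0, 2.0, 3.0]], [[0.5, 0.5, 0.5]]])
swapped = true[::-1]        # same profiles retrieved in another order
shifted = true + 0.1        # every estimated value off by 0.1
d_swapped = hausdorff(graph_points(true), graph_points(swapped))
d_shifted = hausdorff(graph_points(true), graph_points(shifted))
```

Because the distance is taken between unions of graphs, it is insensitive to the order of the profiles, and the two collections may contain different numbers of profiles.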
To illustrate the procedure developed in 4.4, Figure 8 shows smoothed histograms of the distances from 4.5 for the estimation of the parameter functions with sample sizes T = 100, 500, 1000, 5000 given the true values of K and L. The test was performed with N = 5000 trials for each sample size: C running in {2,4,6,8,10}, D in {1,5,10,15,20}, K from 2 to 5, L from 1 to 5 with 10 repetitions of each. The other parameters were Q = min(ceil(pT), 100) with p from 3.1 and a = 1.




Fig 8. Smoothed histograms of the Hausdorff distance between the original parameter functions $(a_i^{(j)}(x))_{0 \le i < K}$ ($1 \le j \le L$) and their estimations $(\hat a_i^{(j)}(x))_{0 \le i < K}$ ($1 \le j \le \hat L$).



5. Conclusion 

After the linear processes in function spaces with heavy-tailed innovations [9], CM3 processes are further examples of jointly regularly varying time series in function spaces. Under (2.2) they satisfy the finite-cluster condition and under (2.3) the strong mixing property.

Further work could determine whether the approximation theorems of Deheuvels for M3 [3] and of Smith and Weissman for M4 [15] also hold for CM3, that is, whether max-stable processes in function spaces, excluding those containing a deterministic component, can be approximated arbitrarily closely by a CM3. These papers also raise the question of a generalization of the multivariate extremal index. Since such an object is hard to define in function spaces, it may have to be replaced by the spectral process.

Regarding estimation, the empirical study on CM3 showed that the mode estimates the length of the tail dependence correctly more often than the median and the mean. For finding the number of patterns, this study revealed the importance of simulating the behavior of the chosen method before using it on real data.
Appendix A: Proofs of Section 2

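Proof of Proposition 2.1. Claim 1 - $X_t$ is a random element in $\mathcal{C}([0,1]^d, \mathbb{R}_+)$.

Given $t \in \mathbb{Z}$, write $X = X_t$ and $X^{(m)} = \sup_{i, |j| \le m} a_i^{(j)} Z_{t-i}^{(j)}$. For $z > 0$, since the $Z_i^{(j)}$ are iid unit-Frechet,

$$P(\|X\|_\infty \le z) = P\big(\forall i \ge 0, \forall j \in \mathbb{Z} : Z_{t-i}^{(j)} \le z / \|a_i^{(j)}\|_\infty\big) = \prod_{j \in \mathbb{Z}} \prod_{i \ge 0} \exp\big(-\|a_i^{(j)}\|_\infty / z\big) = \exp\Big(-\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} \|a_i^{(j)}\|_\infty / z\Big).$$

If $\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} \|a_i^{(j)}\|_\infty = \infty$, then this probability is zero for all $z > 0$, hence $\|X\|_\infty = \infty$ with probability one.

If $\sum_{j \in \mathbb{Z}} \sum_{i \ge 0} \|a_i^{(j)}\|_\infty < \infty$, then $X^{(m)} \le X \le X^{(m)} + \sup_{i, |j| > m} a_i^{(j)} Z_{t-i}^{(j)}$ and thus $\|X - X^{(m)}\|_\infty \le \sup_{i, |j| > m} \|a_i^{(j)}\|_\infty Z_{t-i}^{(j)}$. Writing

$$Y_m = \sup_{i, |j| > m} \|a_i^{(j)}\|_\infty Z_{t-i}^{(j)},$$

we have $Y_m \ge 0$ and $Y_m$ is decreasing in $m$. By a computation similar to the one above, we find that for $y > 0$,

$$P(Y_m \le y) = \exp\Big(-\sum_{i, |j| > m} \|a_i^{(j)}\|_\infty / y\Big).$$

As a consequence, $Y_m \to 0$ in probability and, by monotonicity, almost surely. Since the uniform limit of a sequence of continuous functions is continuous, $X$ is continuous with probability one.

Then the fact that, for every $t \in \mathbb{Z}$, the map $\omega \mapsto X_t(\omega, \cdot)$ with values in $\mathcal{C}([0,1]^d, \mathbb{R}_+)$ is measurable follows easily.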
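
The closed form for $P(\|X\|_\infty \le z)$ in Claim 1 can be checked by simulation in a truncated setting. The sketch below assumes only finitely many parameter functions, represented by hypothetical sup-norms `norms`; the sample size, the level `z` and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sup-norms of finitely many parameter functions (hypothetical values).
norms = np.array([0.5, 0.3, 0.15, 0.05])
z = 1.5
n_samples = 200_000

# If U is uniform on (0, 1), then -1/log(U) is unit-Frechet distributed.
u = rng.uniform(size=(n_samples, norms.size))
frechet = -1.0 / np.log(u)

# Sup-norm of X: the largest of the products norm * Z over all pairs (i, j).
sup_norm = (norms * frechet).max(axis=1)

empirical = (sup_norm <= z).mean()
theoretical = np.exp(-norms.sum() / z)
```

With these values the two quantities should agree up to Monte Carlo error, which is of order $10^{-3}$ for this sample size.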

Claim 2 - $(X_t)_{t \in \mathbb{Z}}$ is a stationary time series. Explicitly, this property is that, for every $n \ge 0$, every $h \ge 0$ and every measurable set $A \subset \mathcal{C}([0,1]^d, \mathbb{R}_+^{n+1})$,

$$P((X_0, \dots, X_n) \in A) = P((X_h, \dots, X_{n+h}) \in A).$$

The argument is based on two facts:

1) It suffices to pick $x \in [0,1]^d$, $z_0, \dots, z_n \in \mathbb{R}$ and verify that

$$P(X_0(x) \le z_0, \dots, X_n(x) \le z_n) = P(X_h(x) \le z_0, \dots, X_{n+h}(x) \le z_n),$$

the case with $k$ points $x_1, \dots, x_k \in [0,1]^d$ being similar.
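2) The right-hand side of the expression in 1) admits the expansion

$$\begin{aligned}
& P\Big(\sup_{j \in \mathbb{Z}} \sup_{i \ge 0} a_i^{(j)}(x) Z_{h-i}^{(j)} \le z_0, \ \dots, \ \sup_{j \in \mathbb{Z}} \sup_{i \ge 0} a_i^{(j)}(x) Z_{n+h-i}^{(j)} \le z_n\Big) \\
&= P\Big(\forall j_0 \in \mathbb{Z}, \forall i_0 \ge 0 : a_{i_0}^{(j_0)}(x) Z_{h-i_0}^{(j_0)} \le z_0, \ \dots, \ \forall j_n \in \mathbb{Z}, \forall i_n \ge 0 : a_{i_n}^{(j_n)}(x) Z_{n+h-i_n}^{(j_n)} \le z_n\Big) \\
&= P\Big(\forall j \in \mathbb{Z}, \forall r \le n : Z_{h+r}^{(j)} \le \min_{\max(0,r) \le s \le n} \frac{z_s}{a_{s-r}^{(j)}(x)}\Big),
\end{aligned}$$

which does not depend on $h$ because the $Z_p^{(j)}$ are iid. This allows us to dispose of the variable $h$.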

Proof of Proposition 2.3. Given $s, t \ge 0$, consider the Dirac masses in





( 0^.. 


"■t+i \ 




ll|a?'lU'---' 


IkP^lloo/ 


d 


\ ( ^'l.^ 






[Vll«?'^^lloo' 


■"llaP^^II 



l|a?^lloo^>a: 



with $i \ge 0$ and $j \in \mathbb{Z}$.

A slight variation in the proof of [9, Lemma B.2] using the identity

$$1\{\max(\|X_1\|, \|X_2\|) > x\} = 1\{\|X_1\| > x\} + 1\{\|X_2\| > x\} - 1\{\|X_1\| > x\} \, 1\{\|X_2\| > x\}$$

and considering $T_i^{(j)}(z) = a_i^{(j)} z$ leads to the new conclusion that



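$$E\,\Big| 1\Big\{\sup_{j \in \mathbb{Z}} \sup_{i \ge 0} \|a_i^{(j)}\|_\infty Z_{-i}^{(j)} > x\Big\} - \sum_{j \in \mathbb{Z}} \sum_{i \ge 0} 1\big\{\|a_i^{(j)}\|_\infty Z_{-i}^{(j)} > x\big\} \Big| = o(P(Z > x)), \qquad x \to \infty. \qquad (A.1)$$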
E



f 



E 



X 



ll^olloo > X 



/(^■■■■^f)l{|l^0||oo>^}] P(Z 



> x) 



E 



P{Z > X 



P{\\Xo\\o.>x) 
|t) l{supsup||a,(^'^||oo^i'^ >a;} 



P{Z > x) 



P{Z > x) 



P{\\Xo\\oo>x) 



E 

i>0 



E 



Xt 



0)ii vU) 



P(||a?^||oo^!f]>x) 

P(||ap^|UZ_, >x) P{Z>x) 
P{Z>x) P(||Xo||oo >2:) 



o(l). 



By dominated convergence and the continuity of $f$, the last expression is equal to



i>0 



( supsup(4^Zi^_J supsup(4^Z}J^.) 



E^ 



E^ 

i>0 



PjWalf^W^Z > x) P{Z>x) 
P{Z>x) P{\\Xo\\^>x) 



0(1) 



\a}'h\^Z>x 



P{\\af^\\ooZ > x) P{Z>x) 



/(e(a(f]+„...,a 



(i) 



P(Z > a;) 

\a9\\ooZ > X 



P(||ap'^|UZ>x) P{Z>x) 
P{Z>x) P{\\Xo\\oo>x) 



P{\\Xo\\oo>x) 



0(1) 



0(1). 



Using the continuous mapping theorem and the regular variation of Z, the last 
term converges to 



E[fiYS{-s,t;i,j))] \\a. 



1 



i>0 



where $Y \sim \mathrm{Pareto}(1)$. The factors
form a probability distribution on $\mathbb{Z}_+ \times \mathbb{Z}$, say that of a random vector $S$.

According to [9, Theorem 3.1 (iv)], the spectral process of the CM3 process $(X_t)_{t \in \mathbb{Z}}$ is

which has the announced distribution.

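Proof of Proposition 2.4. It suffices to check that

$$\lim_{m \to \infty} \limsup_{n \to \infty} P(\exists t \in \{2m, \dots, r_n\} : \|X_t\| > n \mid \|X_0\| > n) = 0.$$

For the convenience of the proof, set $p = t - i$ so that

$$X_t = \sup_{j \in \mathbb{Z}} \sup_{p \le t} a_{t-p}^{(j)} Z_p^{(j)}.$$

Let $b_i^{(j)} := \|a_i^{(j)}\|_\infty$ to have

$$\|X_t\| = \sup_{j \in \mathbb{Z}} \sup_{p \le t} b_{t-p}^{(j)} Z_p^{(j)}.$$

Given $m > 0$, write $A_n := \{\|X_0\| > n\}$ so that

$$A_n = \{\exists j \in \mathbb{Z}, \exists p \le 0 : b_{-p}^{(j)} Z_p^{(j)} > n\}.$$

We have, as $n \to \infty$,

$$n P(A_n) = n\big[1 - P(\forall j \in \mathbb{Z}, \forall p \le 0 : b_{-p}^{(j)} Z_p^{(j)} \le n)\big] = n\Big[1 - \exp\Big(-\frac{1}{n} \sum_{j \in \mathbb{Z}} \sum_{p \le 0} b_{-p}^{(j)}\Big)\Big] \to \sum_{j \in \mathbb{Z}} \sum_{p \le 0} b_{-p}^{(j)}.$$

For $m > 0$, decompose $\{\exists t \in \{2m, \dots, r_n\} : \|X_t\| > n\} = C_{m,n} \cup D_{m,n}$ where

$$C_{m,n} = \{\exists t \in \{2m, \dots, r_n\}, \exists j \in \mathbb{Z}, \exists p > m : b_{t-p}^{(j)} Z_p^{(j)} > n\},$$
$$D_{m,n} = \{\exists t \in \{2m, \dots, r_n\}, \exists j \in \mathbb{Z}, \exists p \le m : b_{t-p}^{(j)} Z_p^{(j)} > n\}.$$

We have

$$\begin{aligned}
P(C_{m,n}) &= 1 - P(\forall t \in \{2m, \dots, r_n\}, \forall j \in \mathbb{Z}, \forall p > m : b_{t-p}^{(j)} Z_p^{(j)} \le n) \\
&= 1 - P\Big(\forall j \in \mathbb{Z}, \forall p > m : \Big[\max_{2m \le t \le r_n} b_{t-p}^{(j)}\Big] Z_p^{(j)} \le n\Big) \\
&= 1 - \exp\Big(-\frac{1}{n} \sum_{j \in \mathbb{Z}} \sum_{p > m} \max_{2m \le t \le r_n} b_{t-p}^{(j)}\Big) \\
&\sim \frac{1}{n} \sum_{j \in \mathbb{Z}} \sum_{p > m} \max_{2m \le t \le r_n} b_{t-p}^{(j)}
\end{aligned}$$

as $n \to \infty$.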

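Setting $p = 2m - q$, write

$$A_n = \{\exists j \in \mathbb{Z}, \exists q \ge 2m : b_{q-2m}^{(j)} Z_{2m-q}^{(j)} > n\}$$

and

$$D_{m,n} = \Big\{\exists j \in \mathbb{Z}, \exists q \ge m : \max_{2m \le t \le r_n} b_{t-2m+q}^{(j)} Z_{2m-q}^{(j)} > n\Big\},$$

hence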



+"E E J'(&22™^^^'^ >n)P(i?™,„). 

The first term of the sum is bounded above by 

^^min[6(^),max6«)] 

and the second term tends to 0 as $n \to \infty$ since

"E E nh%mZ\^^ >n)<Y.Il < ^ 

and 

P(D^ ^ = iyy max ^ !^ ^ ^ foO) ^ q. 



Consequently 

hmsupnP(A„ n < E E ""iil^p ^ '^^.x 6^^]. 



Since $A_n$ and $C_{m,n}$ are independent,

P{3t€{2m,...,rn}:\\Xt\\>n \ ||Xo|| > n) 

— P{Cm,n U Dm,n \ An) 

P{{C'm,n U Dm,n] ^ ^m,n) 

P([C„,„nA„]u[D nA„]}) 

P{An) 

nAn) 



< P{Cm,n) + p^^^^ 
so that



limsupP(3t G {2m,...,r„} : \\Xt\\ >n \ \\Xo\\ > n) 

n—¥oo 

1 



which tends to 0 as $m \to \infty$.



> > min[6i-'\ max 6. 



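Proof of Proposition 2.5. Again set $p = t - i$ to have

$$X_t = \sup_{j \in \mathbb{Z}} \sup_{p \le t} a_{t-p}^{(j)} Z_p^{(j)}.$$

For large $t \ge 0$, we will prove that we can approach $X_t$ by

$$X_t^{+} = \sup_{j \in \mathbb{Z}} \sup_{1 \le p \le t} a_{t-p}^{(j)} Z_p^{(j)}$$

in the sense of

$$\lim_{m \to \infty} P(\forall t \ge m : X_t^{+} = X_t) = 1.$$

Then the conclusion will follow from the fact that the processes $(X_t^{+})_{t \ge 1}$ and $(X_t)_{t \le -1}$ are independent.

As the Borel $\sigma$-field on $\mathcal{C}([0,1]^d)$ is generated by the finite dimensional sets, it is sufficient to check that the limit is 1 on every subset $\{x_1, \dots, x_k\} \subset [0,1]^d$. Passing to the complement, compute

$$\begin{aligned}
p_m &:= P(\exists t \ge m, \exists 1 \le i \le k : X_t^{+}(x_i) < X_t(x_i)) \\
&= P\Big(\exists t \ge m, \exists 1 \le i \le k : \sup_{j \in \mathbb{Z}} \sup_{1 \le p \le t} a_{t-p}^{(j)}(x_i) Z_p^{(j)} < \sup_{j \in \mathbb{Z}} \sup_{p \le t} a_{t-p}^{(j)}(x_i) Z_p^{(j)}\Big) \\
&= P\Big(\exists t \ge m, \exists 1 \le i \le k : \sup_{j \in \mathbb{Z}} \sup_{1 \le p \le t} a_{t-p}^{(j)}(x_i) Z_p^{(j)} < \sup_{j \in \mathbb{Z}} \sup_{p \le 0} a_{t-p}^{(j)}(x_i) Z_p^{(j)}\Big).
\end{aligned}$$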
If $a, b > 0$ and $Z_1, Z_2$ are independent unit-Frechet, as in (3.6),

$$P(a Z_1 < b Z_2) = \frac{1}{1 + \frac{a}{b}}$$

so that



k 



EE 



According to (2.2), $D_t(x)$ converges uniformly to a continuous function $D(x)$ as $t \to \infty$. Without loss of generality, we can assume that $D(x) = 1$. Write $D_t(x) = 1 - \varepsilon_t(x)$ with $\|\varepsilon_t\|_\infty \to 0$ as $t \to \infty$. Consequently,



1 

Dt{x) 



= 1 



£t{x) 
Dt{x) 
hence

Next, use the fact that for every > 0, there exists such that for every 
t ^ and every 1 ^ i ^ fc 



to see that 



EE^*(^o 



£tixi) 



~k "(^^ 

EE^*(^^) 



and so obtain 



Since 



t^m i=l jSZ p^O 

E E E E = E E E(^ - ™ + 1)4^' (-^)' 

t^m i=l jGZ p^O fc^Tji 1=1 jeZ 

thanks to (2.3) we have that $p_m$ vanishes as $m \to \infty$, which proves the result.
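
The comparison identity for two scaled unit-Frechet variables used in the proof above, $P(aZ_1 < bZ_2) = 1/(1 + a/b) = b/(a+b)$ for independent $Z_1, Z_2$, can be checked numerically. The values of `a`, `b`, the sample size and the seed below are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

a, b = 2.0, 3.0          # arbitrary positive scale factors
n = 200_000

# Independent unit-Frechet samples via inversion: Z = -1/log(U).
z1 = -1.0 / np.log(rng.uniform(size=n))
z2 = -1.0 / np.log(rng.uniform(size=n))

empirical = (a * z1 < b * z2).mean()
closed_form = 1.0 / (1.0 + a / b)    # equals b / (a + b)
```

The identity follows from the fact that the reciprocal of a unit-Frechet variable is standard exponential, so the event compares two independent exponentials with rates $a$ and $b$.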
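Proof of Proposition 2.6. Since

$$\|X_t\|_\infty = \Big\|\sup_{j \in \mathbb{Z}} \sup_{i \ge 0} a_i^{(j)} Z_{t-i}^{(j)}\Big\|_\infty = \sup_{j \in \mathbb{Z}} \sup_{i \ge 0} \|a_i^{(j)}\|_\infty Z_{t-i}^{(j)},$$

it suffices to transcribe the formula for an M3 from [15].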